PySQL Connector split into core and non core part #444

jprakash-db · 2024-09-24T05:04:17Z

Description

The databricks-sql-python library is split so that end users can reduce the installed package size based on their requirements.
In particular, pyarrow is the heavy component, and the plan is to make it optional.

Users View

End users who only need the core functionality can use databricks_sql_connector_core, which does not include pyarrow and is therefore much smaller. These users are expected to work mainly with small amounts of data.
The remaining users can continue using the package as it is.
The existing library is split into:

databricks-sql-connector (kept so that the import flow for existing users does not change)
databricks-sql-connector-core (the lightweight library that contains only the core part)
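
As a rough illustration of how a core package can treat pyarrow as optional, the sketch below imports a module only if it is installed and falls back otherwise. The helper names `load_optional` and `result_backend` are hypothetical and are not taken from this PR; the actual connector code may be organised differently.

```python
import importlib
import warnings


def load_optional(module_name):
    """Return the imported module, or None (with a warning) if it is missing.

    Hypothetical helper, shown only to illustrate the optional-dependency idea.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        warnings.warn(
            f"{module_name} is not installed; features that need it are disabled",
            stacklevel=2,
        )
        return None


def result_backend(arrow_module):
    """Pick a result representation based on whether pyarrow was loaded."""
    # "arrow" and "column" are illustrative labels, not connector constants.
    return "arrow" if arrow_module is not None else "column"
```

A caller would run `pyarrow = load_optional("pyarrow")` once at import time and branch on `result_backend(pyarrow)` afterwards, so the core package stays importable without the heavy dependency.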

Tasks Completed

Refactored the code into its respective folders based on the proposed design doc
Changed the pyproject.toml file to reflect the proper dependencies for the split
Verified that all existing e2e and unit tests pass both before and after the split, ensuring parity
Added benchmarking queries to compare performance before and after the split, and created a dashboard to visualize the results
Added dependency tests to check how the library behaves when certain libraries are not available and the user requests functionality that depends on them
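
A dependency test of the kind described in the last item could look roughly like the sketch below. It simulates a missing pyarrow by placing `None` in `sys.modules`, which makes any import of that name raise `ImportError`. The function `fetch_as_arrow` and the test name are hypothetical stand-ins, not the tests added in this PR.

```python
import importlib
import sys
from unittest import mock


def pyarrow_available():
    """Return True if pyarrow can be imported in the current environment."""
    try:
        importlib.import_module("pyarrow")
        return True
    except ImportError:
        return False


def fetch_as_arrow(columns):
    """Hypothetical Arrow-only entry point that fails clearly without pyarrow."""
    if not pyarrow_available():
        raise ImportError("pyarrow is required to return Arrow tables")
    import pyarrow

    return pyarrow.table(columns)


def test_arrow_fetch_without_pyarrow():
    # A None entry in sys.modules blocks the import, even if pyarrow is installed.
    with mock.patch.dict(sys.modules, {"pyarrow": None}):
        assert not pyarrow_available()
        try:
            fetch_as_arrow({"id": [1, 2]})
        except ImportError as exc:
            assert "pyarrow is required" in str(exc)
        else:
            raise AssertionError("expected ImportError when pyarrow is missing")
```

Because `mock.patch.dict` restores `sys.modules` on exit, the simulated absence does not leak into other tests.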

How to Test

The testing pipeline remains the same as it was before the split.
pytest can be used to run both the integration and the unit tests directly: pytest [directory_name or file_name]

Performance Comparison - Benchmarking

Pre-split and post-split performance has been compared using the large and small queries to make sure there is no performance regression.
A dashboard has been created so that every time the benchmarking is run, the results are stored in the benchfood and comparisons can be made easily.
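
For readers who want to reproduce a comparison locally, the following is a minimal sketch of timing a query callable and flagging a regression. The function names and the 10% tolerance are assumptions made for illustration; they do not describe the benchmarking queries or the dashboard used in this PR.

```python
import statistics
import time


def median_runtime(run_query, repeats=5):
    """Run a zero-argument callable several times and return the median seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def is_regression(baseline_seconds, candidate_seconds, tolerance=0.10):
    """Return True if the candidate is slower than baseline by more than tolerance."""
    # tolerance=0.10 is an arbitrary illustrative threshold, not a project setting.
    return candidate_seconds > baseline_seconds * (1 + tolerance)
```

Here `run_query` would wrap a pre-split or post-split connector call, and the two medians would be passed to `is_regression`.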

…ore part (#417)

* Implemented ColumnQueue to test the fetchall without pyarrow Removed token removed token
* order of fields in row corrected
* Changed the folder structure and tested the basic setup to work
* Refractored the code to make connector to work
* Basic Setup of connector, core and sqlalchemy is working
* Basic integration of core, connect and sqlalchemy is working
* Setup working dynamic change from ColumnQueue to ArrowQueue
* Refractored the test code and moved to respective folders
* Added the unit test for column_queue Fixed __version__ Fix
* venv_main added to git ignore
* Added code for merging columnar table
* Merging code for columnar
* Fixed the retry_close sesssion test issue with logging
* Fixed the databricks_sqlalchemy tests and introduced pytest.ini for the sqla_testing
* Added pyarrow_test mark on pytest
* Fixed databricks.sqlalchemy to databricks_sqlalchemy imports
* Added poetry.lock
* Added dist folder
* Changed the pyproject.toml
* Minor Fix
* Added the pyarrow skip tag on unit tests and tested their working
* Fixed the Decimal and timestamp conversion issue in non arrow pipeline
* Removed not required files and reformatted
* Fixed test_retry error
* Changed the folder structure to src / databricks
* Removed the columnar non arrow flow to another PR
* Moved the README to the root
* removed columnQueue instance
* Revmoved databricks_sqlalchemy dependency in core
* Changed the pysql_supports_arrow predicate, introduced changes in the pyproject.toml
* Ran the black formatter with the original version
* Extra .py removed from all the __init__.py files names
* Undo formatting check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* BIG UPDATE
* Refeactor code
* Refractor
* Fixed versioning
* Minor refractoring
* Minor refractoring


jprakash-db added 2 commits August 14, 2024 14:56

Modified the gitignore file to not have .idea file

9cb1ea3

jprakash-db requested a review from gopalldb September 24, 2024 05:04

jprakash-db self-assigned this Sep 24, 2024

jprakash-db requested review from rcypher-databricks, yunbodeng-db, andrefurlan-db, jackyhu-db, benc-db and kravets-levko as code owners September 24, 2024 05:04

Changed the folder structure such that sqlalchemy has not reference here

a022590

jprakash-db had a problem deploying to azure-prod September 25, 2024 17:11 — with GitHub Actions Failure

jprakash-db added 2 commits October 8, 2024 12:15

Fixed README.md and CONTRIBUTING.md

af47301

Added manual publish

64b2818

jprakash-db had a problem deploying to azure-prod October 8, 2024 19:02 — with GitHub Actions Failure

On push trigger added

44b52ac

jprakash-db had a problem deploying to azure-prod October 8, 2024 19:28 — with GitHub Actions Failure

Manually setting the publish step

8db3fd0

jprakash-db had a problem deploying to azure-prod October 8, 2024 19:34 — with GitHub Actions Failure

Changed versioning in pyproject.toml

3d1ef79

jprakash-db had a problem deploying to azure-prod October 17, 2024 05:22 — with GitHub Actions Failure

Bumped up the version to 4.0.0.b3 and also changed the structure to h…

ee7f1e3

…ave pyarrow as optional

jprakash-db had a problem deploying to azure-prod November 6, 2024 08:04 — with GitHub Actions Failure

Removed the sqlalchemy tests from integration.yml file

608d237

jprakash-db temporarily deployed to azure-prod November 11, 2024 17:07 — with GitHub Actions Inactive

[PECO-1803] Print warning message if pyarrow is not installed (#468)

85af9c0

Print warning message if pyarrow is not installed Signed-off-by: Jacky Hu <jacky.hu@databricks.com>

jprakash-db temporarily deployed to azure-prod November 13, 2024 04:48 — with GitHub Actions Inactive

[PECO-1803] Remove sqlalchemy and update README.md (#469)

38ffa95

Remove sqlalchemy and update README.md Signed-off-by: Jacky Hu <jacky.hu@databricks.com>

jprakash-db temporarily deployed to azure-prod November 13, 2024 05:12 — with GitHub Actions Inactive

Removed all sqlalchemy related stuff

6ce555a

jprakash-db temporarily deployed to azure-prod November 13, 2024 05:47 — with GitHub Actions Inactive

generated the lock file

87b1251

jprakash-db deployed to azure-prod November 13, 2024 05:49 — with GitHub Actions Active
